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Abstract. Physical processes rarely occur in isolation, rather they in¬ 
fluence and interact with one another. Thus, there is great benefit in 
modeling potential dependence between both spatial locations and dif¬ 
ferent processes. It is the interaction between these two dependencies 
that is the focus of Genton and Kleiber’s paper under discussion. We see 
the problem of ensuring that any multivariate spatial covariance ma¬ 
trix is nonnegative definite as important, but we also see it as a means 
to an end. That “end” is solving the scientific problem of predicting a 
multivariate field. 
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Physical processes rarely occur in isolation, rather 
they influence and interact with one another. Thus, 
there is great benefit in modeling potential depen¬ 
dence between both spatial locations and different 
processes. It is the interaction between these two 
dependencies that is the focus of GK. 

We see the problem of ensuring that the matrix 
given in GK-(2) is nonnegative definite (n.n.d.) as 
important, but we also see it as a means to an 
end. That “end” is solving the scientific problem 
of predicting a multivariate field of, say, tempera¬ 
ture and rainfall, based on noisy and spatially in¬ 
complete data from weather stations in a region of 
interest. There is also scientific interest in the be¬ 
havior of the measures of cross-spatial dependence 
(e.g., cross-covariance functions), but usually spatial 
prediction is the ultimate goal. 

Of course, an interim goal is estimation of the 
means, covariances and cross-covariances, but not a 
lot of GK’s review was devoted to this. The nonpara- 
metric estimators given by GK-(6) and GK-(ll) are 
useful for recognizing which parametric class of valid 
cross-covariance functions might represent the mul¬ 
tivariate spatial dependence in the data. Estimation 
of the parameters in this class is usually obtained by 
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weighted least squares or maximum likelihood. Opti¬ 
mal spatial prediction in practice proceeds by substi¬ 
tuting these parameter estimates into the model and 
computing the optimal data weights as if the param¬ 
eters were known. Because of this, predictors and 
their standard errors are biased. These and other is¬ 
sues (e.g., change-of-support) are well known in the 
univariate spatial setting, and they clearly also arise 
in the multivariate spatial setting. 

One problem that arises in multivariate spatial 
statistics, but that is not discussed very much by 
GK, is collocation (or not) of spatial data from 
the different variables. This might be viewed as a 
missing-data problem, for which a hierarchical mul¬ 
tivariate spatial statistical model offers a path for¬ 
ward. Being hierarchical does not necessarily mean 
being Bayesian, as the next section on latent mod¬ 
eling demonstrates. 

2. THE LATENT PROCESS IS WHERE THE 
SPATIAL DEPENDENCE IS USUALLY 
MODELED 

The multivariate spatial models in GK do not 
account for the measurement error that exists for 
all physical observations. Their multivariate spatial 
processes are written as (Z(s) : s G M d }, and valid 
spatial-covariance models {C(s, u):s,u € M d } are 
constructed for them. GK’s models are almost all 
“smooth,” in the sense that 

(1) lim C(s,u) = C(s,s). 

u—>-S 

However, an observation or potential observation 
is observed with error, since no measuring instru¬ 
ment is perfect. Therefore, if the observations are 
Z(si),..., Z(s n ), then there is a hidden (or latent) 
process {Y(s) : s € D} such that 

(2) Z(sj) = Y (sj) + e(sj), i = l,...,n, 

where e(sj) has mean zero and covariance matrix 
cov(e(sj)) = S e (sj). Further, the measurement pro¬ 
cess is independent of the latent process, and it 
is usually reasonable to assume that it is inde¬ 
pendent from one observation to another, that is, 
cov(e(sj), e(sj)) = 0, for i / j. 

It appears that GK build multivariate spatial co- 
variance models for the latent process, yet the def¬ 
inition of C(h) in GK-(6) is based on observations 
that always come with measurement error. Hence, 
C(h) is estimating a C^(h) that satisfies 

(3) Cz(0) — lim Cz(h) is n.n.d. 

h->0 


The difference of the two matrices above is in fact 
the measurement-error covariance matrix, which we 
denote as E e (0) in the stationary case [i.e., where 
C(s, s + h) depends only on h and Y £ (s) = 5I e (0) 
for all s]. 

This mismatch between (3) and the stationary 
version of (1), namely, linih-m C(h) = C(0), can 
be resolved once one realizes that GK are really 
building models for a latent Y-process, and that 
a full multivariate spatial covariance function for 
the Z-process (i.e., the observations) is obtained 
by additionally modeling the measurement-error co- 
variance, 5] e (0). In their second example (GK- 
Section 6.2), GK recognize the need for Y £ (0): “Due 
to the fact that the data are observational, we aug¬ 
ment each process’ covariance with a nugget effect.” 
However, there are potentially nonzero off-diagonal 
terms in H £ (0). Notice that the two variables in GK- 
Section 6.2 are observed maximum and minimum 
temperatures obtained from the same instrument 
at location sj, say, and, hence, the measurement er¬ 
ror for the maximum, ei(sj), and the measurement 
error for the minimum, £2 (s,), should be correlated. 
GK’s choice of a diagonal matrix for S £ (0) does not 
reflect this. 

While it may not be obvious, there are also cir¬ 
cumstances where a “measurement error” compo¬ 
nent is needed when modeling deterministic spatio- 
temporal output from computer experiments, such 
as those used in GK-Section 6.1. This component is 
actually a spatio-temporal interaction that “hides” 
the latent spatial process (Kang and Cressie (2013)). 

Finally, it is possible that a latent Y-process is 
itself not “smooth.” In this case, we can write the 
latent process as 

(4) Y(s) = W(s) + £(s), s eD, 

where the W-process is smooth [i.e., satisfies (1)] 
and where the ^-process is independent of the W- 
process and has mean zero. Often, it is assumed that 
£(s) is independent of £(u) for any s/u and, when 
u = s,cov(£(s),£(u)) = £ € (0). 

When modeling univariate spatial data, there has 
been considerable inconsistency in the literature re¬ 
garding how to handle the ^-process (micro-scale 
variation) and the e-process (measurement-error 
process). That confusion should also be avoided in 
the multivariate spatial setting. We need both 51^(0) 
and S £ (0) for different purposes, and these different 
roles should be accounted for: We wish to filter out 
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the e-process (since it is extraneous to the true, hid¬ 
den Y-process), but we wish to predict the ^-process 
(since it represents the scientific process at micro¬ 
scales) . The presence of both processes is manifested 
in C(h), for h near 0, but without more information 
than that supplied by the multivariate spatial data, 
the ^-process and the e-process are confounded. 

3. ESTIMATE USING CROSS-VARIOGRAMS, 
THEN PREDICT USING 
CROSS-COVARIANCES 

A small amount of GK’s review of cross-covariances 
discusses cross-variograms and generalized covari¬ 
ance functions, since they are stationary when the 
process {Z(s):s£fi} is differenced. Before differ¬ 
encing, the process is nonstationary. There are a 
number of ways to do the differencing in a mul¬ 
tivariate context, leading to a lack of agreement 
among researchers of how to capture the cross¬ 
dependence between the processes {Z g (s):s £ D} 
and {Z r ( u) : u £ D}, 1 < q ^ r < p. For many scien¬ 
tific purposes, the key goal is optimal multivariate 
spatial prediction. Therefore, the key measure of 
multivariate spatial dependence should be one that 
can be used, without fail, in kriging and co-kriging 
(i.e., spatial prediction) equations. That is, the opti¬ 
mal weights in the linear combination of the spatial 
data, { Z q (s q i ) :i = 1 ,... , n q , q = 1 ,... ,p}, should de¬ 
pend on this measure. If a measure sometimes yields 
nonoptimal weights, we suggest that it is not as in¬ 
teresting as one that does. While GK-(l) and GK- 

(4) yield optimal weights, the (covariance-based) 
cross-variograms given by GK-(3) do not always. 

Ver Hoef and Cressie (1993, 1994) give an exam¬ 
ple where use of GK-(3) in spatial multivariate pre¬ 
diction yields nonoptimal weights. Indeed, it is GK- 

(3) that should have been tagged “pseudo” in the lit¬ 
erature, not GK-(4). The article that gives the most 
general multivariate spatial dependence measure 
that is a function of h = s — u is Kiinsch, Papritz 
and Bassi (1997), who define the generalized cross¬ 
covariance functions. Certainly, researchers’ famil¬ 
iarity with these more general forms of stationary 
cross-dependence is not high, but the interpretation 
of the appropriate cross-variograms given by GK- 

(4) is not difficult (Cressie and Wikle (1998), Huang 
et al. (2009), Kiinsch, Papritz and Bassi (1997), Ma- 
jure and Cressie (1997)). 

We have found that for univariate spatial pro¬ 
cesses, the estimation of spatial-dependence param¬ 
eters is achieved more stably through the variogram 


than the covariance function (Cressie (1993), Sec¬ 
tion 2.4.1). On the other hand, because optimal spa¬ 
tial prediction using a valid covariance function can 
be used without fail (Cressie and Wikle (2011), Sec¬ 
tion 4.1.2), we recommend the following inferential 
strategy for multivariate spatial processes: Put the 
cross-variogram at the core of parameter estimation 
and the cross-covariance function at the core of op¬ 
timal multivariate spatial prediction. 

We conclude that, for multivariate spatial pro¬ 
cesses, the appropriate cross-variograms given by 
GK-(4) have great potential for the purpose of esti¬ 
mation, and there is a need for a research program 
to pursue the interpretation and robust estimation 
of GK-(4), but not of the inappropriate GK-(3). 

4. CAPTURING SPATIAL DEPENDENCE: 

A CONDITIONAL APPROACH 

Let [•] denote the probability distribution of 
the argument within square brackets. GK tackle 
cross-covariance construction from the perspective 
of the joint distribution, The con¬ 

ditional approach to constructing cross-covariance 
functions writes the joint distribution as the prod¬ 
uct, [Z 2 (-)\Zi(-)][Zi(-)\. Consider the space D dis¬ 
cretized onto a fine-resolution grid, {si,...,s n }, 
such that the processes Z\{-) and Z-^i-) are repre¬ 
sented as n-dimensional vectors Zi and Z 2 , respec¬ 
tively. In practice, this is how a continuously indexed 
process is represented in a computer program. Then, 
from Cressie and Wikle (2011, page 160), the con¬ 
ditional approach yields the bivariate spatial model, 

(5) cov(Z2) = S 211 + BSnB^ 

(6) cov(Zi,Z 2 ) = EuB', 

(7) cov(Zi) = E n , 

where S 2 |i and Sn are n.n.d. matrices obtained 
from univariate spatial processes, and the n x n ma¬ 
trix B of real-valued entries is unrestricted. Crit¬ 
ically, the joint matrix, cov((Z / 1 , Z^/), is always 
n.n.d. 

Equations (5)-(7) are obtained from 


(8) 

E(Z 2 |Zx) 

= BZ! 

(9) 

cov(Z 2 |Zi) 

= S 2|l: 

(10) 

cov(Zi) 

= Eli, 


which only involve univariate spatial processes. It 
should be emphasized that the conditioning in (8) 
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and (9) is on the whole process Z\. In contrast to 
what has been stated elsewhere (Banerjee, Carlin 
and Gelfand (2015), page 273), there is no attempt 
in the conditional approach to build a joint distri¬ 
bution solely from Z 2 (si)\Zi{si), for i = 1,... ,n. In¬ 
deed, 


[£ 2 (Si)|Zl(Si)] 


\Z 2 (sj) |Z,] [Zi] 

[^l(Si)] 




where 


dZi t _j 

= dZi(si) • • • dZi(si_i)dZi(s i+ i)--- dZi(s„). 

The order of the variables Z\ and Z 2 in the con¬ 
ditional approach is a choice that is generally driven 
by the underlying science (e.g., Royle et al. (1999)). 
When more variables are involved, the order may 
not always be obvious, but, if the goal is to construct 
valid covariance and cross-covariance functions, the 
different orderings can be viewed as enlarging the 
space of valid models. 


5. CAPTURING SPATIAL DEPENDENCE 
THROUGH “FACTOR” PROCESSES: A JOINT 
APPROACH 

Any univariate covariance function, C(s,u), that 
satisfies mild integrability conditions has a Karhunen- 
Loeve representation (Papoulis (1991)): 

OO 

C(s, u) = £ A a P a (s)P a (u), 

a= 1 

where Ai > A 2 > • • • > 0 and {P a (-) :a = 1,2,...} are 
orthogonal eigenvectors obtained by solving a Fred¬ 
holm integral equation. After truncation, the func¬ 
tion, 

b 

(11) C (6) (s,u) = ^A 0 P a (s)P a (u), 

a— 1 

is still n.n.d. Indeed, an equivalent way to write (11) 
is in terms of a spatial process, 

b 

(12) Z (b) (s) = /i(s) + ^77 a P a (s), 

a= 1 

where rj = (r/i ,..., rjb)' is a mean-zero random vector 
with cov(rj) = diag(Ai,..., A{,). 

Because the original covariance has been trun¬ 
cated, one way to capture the lost covariation is to 
add back a simple random process: 

b 

(13) Z(s) = fi(s) +^2r]aPa(s) + f(s). 

a =1 


It is common to choose £(■) to be a white-noise 
process, but it is also straightforward to maintain 
some spatial structure in £(•) (Berliner, Wikle and 
Cressie (2000)). Clearly, the expression (13) could be 
thought of as a spatial-factor-analysis model (Chris¬ 
tensen and Amemiya (2001), Lopes, Salazar and 
Gamerman (2008)), although there are important 
differences in what is assumed known and what is 
estimated. 

The definition (13) is directly expressed in terms 
of the random components of the model, and it is a 
very fertile way of constructing covariance functions: 
Specifically, replace {P a ( - )} with any set of known 
basis functions {S' a (-)}, orthogonal or not; replace 
cov(? 7 ) = diag(Ai,..., A&) with any bxb positive def¬ 
inite (p.d.) matrix K; and write cov(£(s),£(u)) = 
cr|/(u = s). Then 

(14) C(s, u) = S(s) / KS(u) + u|/(u = s) 

is a valid nonstationary univariate covariance model, 
where S(-) = (Si(-),..., Sb(-)) 1 ■ Cressie and Johan- 
nesson (2008) call this a Spatial Random Effects 
(SRE) model. 

The generalization of (14) to multivariate spatial 
processes is easiest to obtain from its expression in 
terms of random components. Here, the bivariate 
case shows its potential: 

Zi(s) = S (1) (s)'? 7 i+ ^i(s), seD, 

(15) 

Z 2 (s) = S ( 2 ) (s)'?7 2 + £ 2 (s), s<E D, 

where S^(-) and S^(-) are given spatial ba¬ 
sis functions that are quite likely to be differ¬ 
ent for Z i(-) and for Z 2 (-), and and r ] 2 may 
have nonzero cov(r7 1 ,r/ 2 ); see Bradley, Holan and 
Wikle (2015). Note that (15) could be viewed as 
an errors-in-variables parameterization (Christensen 
and Amemiya, 2002, 2003). Clearly, the implied co- 
variance and cross-covariance functions are nonsta¬ 
tionary, but their parameters can still be estimated 
from the multivariate spatial data. 

Representations using “factor” processes, such 
as in (15), generalize many of the constructions 
outlined in GK-Section 2. Let {U r (-): r = 1,... ,p} 
be a set of independent univariate processes with 
mean 0, variance 1 and stationary correlation func¬ 
tions {p r (h)}. Suppose further that {g qr (h): q, r = 
1,... ,p} are integrable kernels; then a very general 
factor representation is 


(16) 



s)U r (u) du. 
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The cross-covariances implied by (16) are 

C qr ( h) 

= *52 Mvi)0rfc(V2H(vi - V 2 + h) dvi dv 2 . 

k= iV"' 

Clearly, asymmetry is present in all but the sim¬ 
plest cases, and the linear model of coregional¬ 
ization in GK-Section 2.1 is recovered by setting 
g qr ( u — s) = A qr 5{\i — s) in (16), where <5(-) is the 
Dirac delta function. The cross-covariance function 
given at the beginning of GK-Section 2.2 is also re¬ 
covered by setting p k ( h) = p(h)/p, g qk ( h) = k q ( h), 
and g r k{ h) = Ay(h). There are many familiar special 
cases, and the “factor” processes {U r (-)} in (16) do 
not even need to be independent. 

6. COMMENTS ON THE DATA EXAMPLES 

In GK-Section 6, several bivariate spatial models 
are implemented on data describing pressure and 
temperature. GK compare these models and assess 
which ones are best able to capture the dependence 
within and between the processes. This may be the 
first time such an exercise has been carried out; nev¬ 
ertheless, there are aspects of the analyses we would 
modify. 

First, the various models under consideration con¬ 
tain different numbers of free parameters. For ex¬ 
ample, the parsimonious Matern has six free pa¬ 
rameters, while the nonstationary parsimonious 
Matern has several hundred. In this context, the 
log-likelihood does not provide a useful comparison 
of model fit. We suggest that the Akaike Information 
Criterion (AIC) and its corrected version (AICc) are 
preferable when the number of free parameters dif¬ 
fer across models (Hoeting et al. (2006), Lee and 
Ghosh (2009)). 

Second, the other summaries (RMSE and CRPS) 
may not be indicative of model performance, since 
parameters estimated from the entire dataset were 
used. In our view, the flexibility, adaptability and 
utility of a model can only be assessed (in the con¬ 
text of cross-validation) using data that have not 
been used for parameter estimation. 

Finally, as we mentioned earlier (Section 2), the 
so-called nugget-effect matrix that consists of both 
measurement-error and micro-scale matrices needs 
to be modeled in both examples, and in the second 
example (GK-Section 6.2) the possibility of nondi¬ 
agonal contributions should be considered. 


7. BIBLIOGRAPHIC NOTES 

It is a big task to review multivariate geostatistics, 
and we appreciate that GK had limits on what they 
could cover. They selected a few topics that went 
beyond their core goal of reviewing cross-covariance 
functions and to some of these topics we add the 
following bibliographic notes. 

Nonstationarity for Factor Processes 

Wikle et al. (2010): Recall from Section 5 that, 
in the univariate setting, reduced-rank covariance 
functions are very useful for big spatial data: 

cov(Z(s), Z(u)) = S(s)'KS(u) + u(s)/(u = s), 

where S(-) is a given 6-dimensional (6 <C n) vector 
of spatial basis functions, K is an unknown 6x6 
p.d. matrix, and v(s) > 0. In Wikle’s review of these 
rank-6 covariance models, he points out their com¬ 
putational advantages for spatial prediction. A gen¬ 
eralization to multivariate covariance functions is 
straightforward; see Section 5. When considering 
global processes, such as in remote sensing appli¬ 
cations, the nonstationarity of these models is an 
advantage. 

Asymmetric Cross-Covariance Functions 

The asymmetry in multivariate spatial processes 
may come from, say, preferable mineralization of 
metals in an ore body or deposition of lighter partic¬ 
ulate matter in the environment. Cross-covariance 
models should be able to detect such phenomena. 
The “shifted-lag model” is a natural way to cap¬ 
ture sources of asymmetry and has a longer history 
than that indicated by the reference to Li and Zhang 
(2011) in GK-Section 5.1, as illustrated by the fol¬ 
lowing articles. 

Ver Hoef and Cressie (1993, 1994): The shifted- 
lag model given in GK-(12) was proposed. 

Majure and Cressie (1997): The shifted-lag model 
was estimated from variance-based cross-variograms. 

Christensen and Amemiya (2001, 2002): A latent 
variable factor analysis model for a multivariate spa¬ 
tial process was based on the shifted-lag model. 

Spatio-Temporal Covariance Functions 

From the point of view of building covariance 
functions, the temporal dimension could be viewed 
simply as an extra “spatial” dimension. However, an 
alternative approach, based on the dynamical evo¬ 
lution of spatial processes in time, often allows op¬ 
timal prediction to be carried out without explicitly 
constructing covariance models. 
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Wikle et al. (2001): This article uses conditional- 
probability modeling (in space and time) and reduced- 
rank models to achieve optimal spatio-temporal pre¬ 
diction. It bypasses the need for constructing spatio- 
temporal cross-covariance functions. 

Cressie and Wikle (2011): In much of this book, 
cross-covariance functions are considered as deriva¬ 
tive measures, from scientifically interpretable dy¬ 
namical (multivariate) spatial models; see their 
pages 418-425. A hierarchical dynamical approach 
is taken that yields optimal spatio-temporal predic¬ 
tion directly, without having to pass through multi¬ 
variate covariance-function modeling. 

In our view, hierarchical physical-statistical mod¬ 
eling of big, spatio-temporal, multivariate, nonlin¬ 
ear, non-Gaussian data will represent the next fron¬ 
tier. 
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